Abstract. We extend a low-rate improvement of the random coding bound on the reliability of 
a classical discrete memoryless channel to its quantum counterpart. The key observation that we 
make is that the problem of bounding below the error exponent for a quantum channel relying on 
the class of stabilizer codes is equivalent to the problem of deriving error exponents for a certain 
symmetric classical channel. 



1. Introduction 

Derivation of error bounds in quantum information theory is usually performed by translation of the standard methods from its classical counterpart. Error exponents for the classical-quantum channel (transmission of orthogonal states) were derived in [10]. Here we are concerned with the so-called quantum-quantum channel, which is the standard setting for quantum error-correcting codes. An exponential upper bound on the distortion (error) probability was derived in a recent paper [?]. Here we show that this bound can be improved for low noise and low values of the transmission rate. In Sect. 2 we give precise definitions of the quantum discrete memoryless channel (henceforth QDMC), codes, decoding, and error probability. Sect. 3 contains a brief review of stabilizer codes and their decoding. It turns out that if we restrict ourselves to the class of stabilizer codes, then the bounds on their distortion exponent also follow from the corresponding classical results. In particular, in Sect. 4 we give a short proof of the result of [?]. The link to the classical results motivates us to derive a low-rate error exponent for a QDMC (Sect. 5). A condition under which it improves the random coding bound of [?] is given. We conclude by specializing the results to the case of a depolarizing channel and showing a concrete improvement for low code rates in the case of low noise. 



2. Preliminaries 

A quantum $d$-ary digit, a qudit, is a $d$-dimensional complex space $\mathcal{H} = \mathbb{C}^d$, where $d$ will be assumed a prime number. Below by $\mathcal{X}$ we denote the finite field $\mathbb{F}_q$, where $q = d^2$. We consider transmission of unit-length state vectors from the $d^n$-dimensional space $\mathcal{H}_n = \mathcal{H}^{\otimes n}$. Let us fix some orthonormal basis of $\mathcal{H}$ and write it as $(|0\rangle, |1\rangle, \dots, |d-1\rangle)$. A unitary basis of error operators (an error basis, for short) is defined as $\{E_{ij} = X^i Z^j,\ i,j \in \mathbb{F}_d\}$, where

$$X|i\rangle = |(i-1) \bmod d\rangle, \qquad Z|j\rangle = \omega^j |j\rangle,$$

and $\omega$ is a primitive $d$th root of unity. 
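The error basis can be checked numerically. The following is a minimal sketch, assuming the conventions $X|i\rangle = |(i-1) \bmod d\rangle$ and $Z|j\rangle = \omega^j|j\rangle$ with $\omega = e^{2\pi i/d}$; the helper names (`matmul`, `matpow`, `dagger`, `max_abs_diff`, `error_basis`) are hypothetical and written only for this illustration.

```python
import cmath

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def matpow(A, e):
    """A raised to a nonnegative integer power e."""
    n = len(A)
    out = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for _ in range(e):
        out = matmul(out, A)
    return out

def dagger(A):
    """Conjugate transpose."""
    n = len(A)
    return [[complex(A[c][r]).conjugate() for c in range(n)] for r in range(n)]

def max_abs_diff(A, B):
    """Largest entrywise deviation between two matrices."""
    n = len(A)
    return max(abs(A[r][c] - B[r][c]) for r in range(n) for c in range(n))

def error_basis(d):
    """Return (omega, X, Z, E) with E[(i, j)] = X^i Z^j for i, j in 0..d-1."""
    omega = cmath.exp(2j * cmath.pi / d)  # a primitive d-th root of unity
    # X|i> = |(i-1) mod d>: column c carries a 1 in row (c-1) mod d.
    X = [[1 if r == (c - 1) % d else 0 for c in range(d)] for r in range(d)]
    # Z|j> = omega^j |j>: diagonal matrix.
    Z = [[omega ** r if r == c else 0 for c in range(d)] for r in range(d)]
    E = {(i, j): matmul(matpow(X, i), matpow(Z, j))
         for i in range(d) for j in range(d)}
    return omega, X, Z, E

d = 3
omega, X, Z, E = error_basis(d)
identity = matpow(X, 0)
# Every element of the error basis should be unitary.
all_unitary = all(max_abs_diff(matmul(dagger(M), M), identity) < 1e-9
                  for M in E.values())
# With these conventions X and Z commute up to the phase omega: XZ = omega * ZX.
ZX_scaled = [[omega * v for v in row] for row in matmul(Z, X)]
commutation_gap = max_abs_diff(matmul(X, Z), ZX_scaled)
```

The basis has $d^2 = q$ elements, which is why the error operators can be indexed by the letters of $\mathcal{X}$.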

A quantum discrete memoryless channel $W$ is defined as an arbitrary collection of operators of the form $(A_u, u \in \mathcal{X})$, where

$$A_u = \sum_{v \in \mathcal{X}} a_{uv} E_v$$

and where the complex row vectors $a_v = (a_{uv}, u \in \mathcal{X})$ define a probability distribution on $\mathcal{X}$ given by

$$W(v) = a_v a_v^* \quad (v \in \mathcal{X}), \qquad \sum_{v} W(v) = 1.$$

We note that this definition is derived from the general definition of the quantum channel $\Phi$, which is a trace-preserving completely positive map on the set of density operators on $\mathcal{H}_n$. It is known that any such map $\Phi$ can be written as

$$\Phi(S) = \sum_k A_k S A_k^*$$

for some set of operators $A_k$, where $S$ is a density operator on $\mathcal{H}_n$ (the so-called Kraus representation of the channel). The absence of memory in the channel is reflected by the fact that the operators $A_k$ can be written as tensor products of operators on $\mathcal{H}$. 

As an example, let $d = 2$ and consider the so-called depolarizing channel $W = \{\sqrt{1-p}\,I, \sqrt{p/3}\,\sigma_x, \sqrt{p/3}\,\sigma_z, \sqrt{p/3}\,\sigma_y\}$, where $\{\sigma_x, \sigma_z, \sigma_y\}$ is the set of Pauli matrices. This channel acts on qubits by phase flips, amplitude flips, or combinations of both applied with probability $p/3$ each. More generally, for any $d$ we can define a depolarizing channel as follows: $W = \{\sqrt{1-p}\,I;\ \sqrt{p/(d^2-1)}\,E_{ij},\ i,j \in \mathbb{F}_d,\ (i,j) \ne (0,0)\}$.
A quantum code $Q$ is a linear subspace of $\mathcal{H}_n$. The rate of $Q$ is defined as $R = R(Q) := (\log_d K)/n$, where $K$ is the dimension of $Q$. Let $\mathcal{R}$ be a recovery operator, i.e., another completely positive trace-preserving map on $\mathcal{H}_n$, restricted to $Q$. The fidelity of the code $Q$ for a given channel $\Phi$ and a given recovery operator $\mathcal{R}$ equals

$$F(Q,\{\Phi,\mathcal{R}\}) = \frac{1}{K}\min_{B}\sum_{\phi\in B}\langle\phi|\,\mathcal{R}(\Phi(|\phi\rangle\langle\phi|))\,|\phi\rangle,$$

where the minimum is taken over all orthonormal bases $B$ of the code. In particular, for the QDMC defined above, $\Phi = W^{\otimes n}$. Below we will omit the recovery operator from the notation. 
For a given rate $R$ we wish to define the reliability (exponent) of a QDMC $W$. Let

$$E(n,R,W) = \sup_{Q \subset \mathcal{H}_n:\, R(Q) \ge R} -\frac{1}{n}\log_d\bigl(1 - F(Q,W)\bigr)$$

be the error exponent for the rate $R$ and code length $n$. Let

$$E(R,W) = \liminf_{n\to\infty} E(n,R,W).$$

Let $H_m(Q) = -\sum_{x\in\mathcal{X}} Q(x)\log_m Q(x)$ be the entropy of a probability distribution $Q$ on $\mathcal{X}$. For two probability distributions $P$ and $Q$, their information divergence is given by $D_m(Q\|P) = \sum_{x\in\mathcal{X}} Q(x)\log_m \frac{Q(x)}{P(x)}$ (if the base of the logarithms and exponents below is omitted, it is equal to $d$). 

The following theorem was proved in [?]. 
Theorem 1. For any rate $R \ge 0$ and any QDMC $W$ 
(1) $E(R,W) \ge E_r(R,W) = \min_{V}\bigl[D(V\|W) + |1 - H(V) - R|^+\bigr]$, 

where the minimum is taken with respect to all probability distributions $V$ on $\mathcal{X}$ and $|a|^+ := \max(a, 0)$. 

Since $E_r(R,W) > 0$ for $0 \le R < 1 - H(W)$, this result also implies a lower bound of $1 - H(W)$ on the capacity of the channel $W$. 
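The bound $1 - H(W)$ is easy to evaluate for the depolarizing channel. The sketch below assumes the distribution $W(0) = 1-p$, $W(v) = p/(q-1)$ for $v \ne 0$ (for $d = 2$ this is the channel with probability $p/3$ per Pauli error described above; for $d > 2$ it is an assumed generalization), and takes entropy to base $d$ as in the convention stated earlier. The function names are hypothetical.

```python
import math

def depolarizing_distribution(d, p):
    """Distribution on the q = d^2 letters: 1-p on the zero letter, p/(q-1) elsewhere."""
    q = d * d
    return [1.0 - p] + [p / (q - 1)] * (q - 1)

def entropy(dist, base):
    """Shannon entropy of a probability vector, with logarithms to the given base."""
    return -sum(w * math.log(w, base) for w in dist if w > 0)

def capacity_lower_bound(d, p):
    """The quantity 1 - H(W), with H computed to base d."""
    return 1.0 - entropy(depolarizing_distribution(d, p), d)

low_noise = capacity_lower_bound(2, 0.0005)
uniform_case = capacity_lower_bound(2, 0.75)  # W is uniform on 4 letters, so H(W) = 2
```

Because entropy is measured to base $d$ on an alphabet of $d^2$ letters, $H(W)$ ranges over $[0, 2]$, and the bound becomes vacuous once $H(W) \ge 1$.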

Given a vector $x \in \mathcal{X}^n$, we can define an empirical probability distribution $P$ on $\mathcal{X}$ given by $P(u) = |\{i : x_i = u\}|/n$, $u \in \mathcal{X}$. Below we call it the type of the vector $x$ and write $T(x) = P$. The type of the all-zero vector will be denoted by $P_0$; we have $P_0(u) = \delta_{u,0}$. The set of all sequences of a given type $P$ will be denoted as $\mathcal{T}_P(\mathcal{X}^n)$. It is clear that 

$$|\mathcal{T}_P(\mathcal{X}^n)| = \exp_q\bigl(n(H_q(P) + o(1))\bigr).$$

Let $\mathcal{P}(\mathcal{X}^n)$ be the set of all types on $\mathcal{X}^n$. Obviously, 

$$|\mathcal{P}(\mathcal{X}^n)| = \binom{n+q-1}{q-1} < n^q \quad (n, q \ge 2).$$

For any $x \in \mathcal{X}^n$ and any stochastic matrix $V : \mathcal{X} \to \mathcal{Y}$, the $V$-shell of $x$ is defined as the set $\mathcal{T}_V(x) \subset \mathcal{Y}^n$ formed by those $y$ whose conditional type is $V$. This means that for any such $y$ its type is $T(y) = PV$, where $PV$ is the probability distribution on $\mathcal{Y}$ given by $PV(y) = \sum_{x\in\mathcal{X}} P(x)V(y|x)$. 
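The counting facts about types can be illustrated on a small case. This is a minimal sketch with hypothetical helper names; it computes the type of a vector over a $q$-letter alphabet, the exact size of its type class (a multinomial coefficient), and compares these with $q^{nH_q(P)}$ and $n^q$.

```python
import math
from collections import Counter
from fractions import Fraction

def type_of(x, alphabet):
    """Empirical distribution P(u) = |{i : x_i = u}| / n as exact fractions."""
    n = len(x)
    counts = Counter(x)
    return {u: Fraction(counts[u], n) for u in alphabet}

def type_class_size(P, n):
    """Number of length-n sequences of type P: n! / prod_u (n P(u))!."""
    size = math.factorial(n)
    for prob in P.values():
        size //= math.factorial(int(prob * n))
    return size

def entropy(P, base):
    """Entropy of the type P with logarithms to the given base."""
    return -sum(float(p) * math.log(float(p), base) for p in P.values() if p > 0)

q, n = 4, 8
x = (0, 0, 0, 1, 2, 0, 3, 0)
P = type_of(x, range(q))
size = type_class_size(P, n)             # exact size of the type class
upper = q ** (n * entropy(P, q))         # the exponential estimate q^{n H_q(P)}
num_types = math.comb(n + q - 1, q - 1)  # number of types of length-n sequences
```

For short lengths the exponential estimate is loose; the statement in the text concerns only the exponent as $n$ grows.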



3. Stabilizer codes and their decoding 

The construction of stabilizer quantum codes in [?], [?] is as follows. Consider the vector space $V_n = (\mathbb{F}_d \times \mathbb{F}_d)^n$. Write a typical vector $x \in V_n$ as $x = ((x_{11},x_{12}),\dots,(x_{n1},x_{n2}))$ and consider a standard symplectic form on $V_n$ defined by 

$$(x,y) = \sum_{i=1}^{n} (x_{i1}y_{i2} - x_{i2}y_{i1}).$$

Now let $C \subset \mathcal{X}^n$ be an additive code, i.e., an additive subgroup of $\mathcal{X}^n$, and define $C^\perp$ as the set of vectors in $(\mathbb{F}_q^+)^n = V_n$ that are $(\,,)$-orthogonal to every vector in $C$. Suppose that the number of vectors in $C$ is $q^k$, so that the rate of $C$ equals $R(C) = k/n$. We then have $|C^\perp| = q^{n-k}$. 
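The relation $|C^\perp| = q^{n-k}$ can be verified by brute force on a toy code. The sketch below assumes the standard symplectic form $(x,y) = \sum_i (x_{i1}y_{i2} - x_{i2}y_{i1}) \bmod d$ on vectors of pairs; the generators and the function names are hypothetical choices made for illustration only.

```python
from itertools import product

def symplectic(x, y, d):
    """Symplectic form on vectors of pairs over the integers mod d."""
    return sum(a1 * b2 - a2 * b1 for (a1, a2), (b1, b2) in zip(x, y)) % d

def additive_span(generators, d, n):
    """Additive subgroup generated by the given vectors (closure under addition)."""
    zero = tuple((0, 0) for _ in range(n))
    code, frontier = {zero}, [zero]
    while frontier:
        v = frontier.pop()
        for g in generators:
            w = tuple(((a1 + g1) % d, (a2 + g2) % d)
                      for (a1, a2), (g1, g2) in zip(v, g))
            if w not in code:
                code.add(w)
                frontier.append(w)
    return code

def symplectic_dual(code, d, n):
    """All vectors orthogonal to every code vector, found by exhaustive search."""
    letters = list(product(range(d), repeat=2))
    return {x for x in product(letters, repeat=n)
            if all(symplectic(x, c, d) == 0 for c in code)}

d, n = 2, 2
q = d * d
# Toy generators: ((1,0),(1,0)), ((0,1),(0,1)) and ((1,0),(0,0)).
C = additive_span([((1, 0), (1, 0)), ((0, 1), (0, 1)), ((1, 0), (0, 0))], d, n)
C_dual = symplectic_dual(C, d, n)
```

Here $|C| = 8 = q^{3/2}$, so $k = 3/2$, and the dual has $q^{n-k} = 2$ elements and is contained in $C$, which is the situation required for the stabilizer construction.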
We begin with a pair of codes $C^\perp \subset C \subset \mathcal{X}^n$ and a set $\mathcal{E} \subset \mathcal{X}^n$ such that 

$$\forall x,y \in \mathcal{E}\quad (y - x \in C) \Rightarrow (x = y).$$

According to this definition, we can take at most one error vector per coset of $\mathcal{X}^n/C$, and therefore the maximum size of the set $\mathcal{E}$ equals $q^{n-k}$. It is possible to construct a quantum code $Q \subset \mathcal{H}_n$ of (complex) dimension $d^{2k-n}$ which is an invariant subspace of the set of error operators $N_{\mathcal{E}} = \{N_x, x \in \mathcal{E}\}$ given by 

$$N_x = N_{x_1} \otimes N_{x_2} \otimes \cdots \otimes N_{x_n},$$

where for every $i$ the operator $N_{x_i} = E_{x_{i1},x_{i2}}$ is an element of the error basis determined by the representation of the coordinate $x_i \in \mathbb{F}_q$ of $x$ as a pair of elements $(x_{i1},x_{i2}) \in (\mathbb{F}_d)^2$. Moreover, there are $d^{2(n-k)}$ such invariant subspaces whose orthogonal direct sum equals $\mathcal{H}_n$. Thus the rate $R$ of the stabilizer code $Q$ is related to the rate of $C$ as $R = 2R(C) - 1$. 

A stabilizer code $Q$ is $\mathcal{E}$-error-correcting in the sense that the action of any error operator from the set $N_{\mathcal{E}}$ can be removed from the transmitted state. The received state $w$ is measured with respect to the set of pairwise orthogonal operators $P_i$, each being an orthogonal projector on the subspace of $\mathcal{H}_n$ that corresponds to a coset of $\mathcal{X}^n/C$. Then within this coset we find one of the most probable error vectors and recover the transmitted state by applying the inverse error operator. 

The following bound on the fidelity of a given stabilizer code $Q$ was proved in [?] based on a result in [12]. 

Theorem 2. [?] Let $Q$ be an $\mathcal{E}$-error-correcting stabilizer code used over a QDMC $W$. Then 

$$1 - F(Q,W) \le \sum_{x \notin \mathcal{E}} W^n(x).$$

This theorem provides a link between the quantum and the classical setting which will be pivotal in our argument. 

Note that there is substantial freedom in the choice of the error set $\mathcal{E}$. To derive our result, we will take $\mathcal{E}$ as follows. As pointed out above, the channel $W$ defines a probability distribution $W$ on $\mathcal{X}$. For an additive code $C$ consider the quotient space $\mathcal{X}^n/C$. From each coset $S$ we take one of the vectors $y = y(S)$ whose probability $W^n(y) = \prod_j W(y_j)$ is the largest in $S$. Finally, we take $\mathcal{E} = \bigcup_S y(S)$. 
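The choice of $\mathcal{E}$ and the resulting bound of Theorem 2 can be computed exhaustively for a toy code. The following sketch assumes the qubit depolarizing distribution ($W(0) = 1-p$, $p/3$ on each nonzero letter) and a hypothetical additive code of length 2; it picks one most probable vector per coset of $\mathcal{X}^n/C$ and evaluates $\sum_{x\notin\mathcal{E}} W^n(x) = 1 - \sum_{x\in\mathcal{E}} W^n(x)$. All function names are illustrative.

```python
from itertools import product
from math import prod

def add(x, y, d):
    """Componentwise addition of vectors of pairs mod d."""
    return tuple(((a1 + b1) % d, (a2 + b2) % d)
                 for (a1, a2), (b1, b2) in zip(x, y))

def additive_span(generators, d, n):
    """Additive subgroup generated by the given vectors."""
    zero = tuple((0, 0) for _ in range(n))
    code, frontier = {zero}, [zero]
    while frontier:
        v = frontier.pop()
        for g in generators:
            w = add(v, g, d)
            if w not in code:
                code.add(w)
                frontier.append(w)
    return code

def depolarizing(d, p):
    """Letter probabilities: 1-p on (0,0), p/(q-1) on every other pair."""
    q = d * d
    return {(a, b): (1.0 - p if (a, b) == (0, 0) else p / (q - 1))
            for a, b in product(range(d), repeat=2)}

def coset_leaders(code, W, d, n):
    """One most probable vector from each coset of the code."""
    letters = list(product(range(d), repeat=2))
    prob = lambda y: prod(W[s] for s in y)        # memoryless: product over letters
    leaders = {}
    for y in product(letters, repeat=n):
        label = min(add(y, c, d) for c in code)   # canonical name of the coset y + C
        if label not in leaders or prob(y) > prob(leaders[label]):
            leaders[label] = y
    return list(leaders.values()), prob

d, n, p = 2, 2, 0.01
C = additive_span([((1, 0), (1, 0)), ((0, 1), (0, 1)), ((1, 0), (0, 0))], d, n)
W = depolarizing(d, p)
leaders, prob = coset_leaders(C, W, d, n)
bound = 1.0 - sum(prob(y) for y in leaders)       # total probability outside the set E
```

For this code there are two cosets, with leaders of weight 0 and 1, so the bound equals $1 - (1-p)^2 - (1-p)p/3$.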

We conclude this section by deriving a general analog of the weight distribution and of the Gilbert-Varshamov bound for additive codes over $\mathcal{X}$. For $q = 4$ and the Hamming weight distribution this result was proved in [?]. 

Theorem 3. For any rate $R(C) \ge 0$ and any $\delta > 0$, there exists an additive code $C \subset \mathcal{X}^n$ of size $\exp_q(nR(C))$ such that $C^\perp \subset C$ and for any type $P \ne P_0$, 

(2) $|C \cap \mathcal{T}_P| \le \exp_q[n(R(C) + H_q(P) - 1 + \delta)]$. 

In particular, for any $x \in C\setminus\{0\}$ with $T(x) = P$ we have 

$$R(C) \ge 1 - H_q(P) - \delta.$$

Proof. Let 

$$S_{n,k} = \{C \subset \mathcal{X}^n : \log_q |C| = k,\ C^\perp \subset C\}.$$

It was observed several times in the literature (e.g., [?], [?]) that every vector $x \in \mathcal{X}^n\setminus\{0\}$ is contained in the same number of codes in $S_{n,k}$. Denote this number by $B$ and let $S = |S_{n,k}|$. Counting in two ways the sum of sizes of all the codes in $S_{n,k}$ we obtain $(q^n - 1)B = (q^k - 1)S$. Let us fix a type $P$. Clearly, 

$$\Bigl|\bigcup_{P' \in \mathcal{P}(\mathcal{X}^n):\ H_q(P') \le H_q(P)} \mathcal{T}_{P'}\Bigr| \le n^q q^{nH_q(P)}.$$

Thus as long as $n^q q^{nH_q(P)} B < S$, or 

$$n^q q^{nH_q(P)} < \frac{q^n - 1}{q^k - 1} = q^{n(1 - R(C))(1 + o(1))},$$

there exists a code $C \in S_{n,k}$ such that for every $x \in C\setminus\{0\}$ we have $H_q(T(x)) > H_q(P)$. This proves the last part of the claim. 

For any $P \ne P_0$ the average number of code vectors of type $P$ in a code $C \in S_{n,k}$ equals 

$$\frac{1}{S}\sum_{C \in S_{n,k}} |C \cap \mathcal{T}_P| = \frac{B\,|\mathcal{T}_P|}{S} = \exp_q[n(R(C) + H_q(P) - 1 + o(1))].$$

Since there are no more than $n^q$ different types, this proves the first part of the claim. □ 

4. The random coding bound 

Let $\mathcal{X}$ be an input and $\mathcal{Y}$ an output alphabet of a classical DMC given by a stochastic matrix $W(y|x)$. Suppose that $\mathcal{X} \subset \mathcal{Y}$ and that $\mathcal{Y}$ is an abelian group, written additively. A channel is called additive if $W(y|x)$ depends only on the difference $y - x$, i.e., $W(y|x) = W(y - x)$ (the last term is actually $W(y - x|0)$, but below we abuse the notation slightly and use unconditional distributions). Note that an additive channel $W$ is symmetric in the sense that every row is a permutation of a fixed probability vector, and the same is true with respect to every column. By Theorem 2 the problem of bounding from below the reliability exponent of a QDMC is now reduced to the corresponding classical problem for a symmetric, additive DMC with $\mathcal{Y} = \mathcal{X}$. With this observation Theorem 1 follows by a combination of standard arguments, so a reader well familiar with error exponents of classical channels could as well stop here. In the interest of staying self-contained we will supply some more details. 

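A. General form of the random coding exponent. For any type $P \in \mathcal{P}(\mathcal{X}^n)$ and any stochastic $|\mathcal{X}| \times |\mathcal{Y}|$ matrix $V$, let 

$$D(V\|W|P) = \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} P(x)V(y|x) \log\frac{V(y|x)}{W(y|x)}$$

be the conditional divergence and 

$$I(P,V) = H(PV) - H(V|P), \qquad H(V|P) = \sum_{x\in\mathcal{X}} P(x)H(V(\cdot|x)),$$

be the mutual information between $x \in \mathcal{T}_P(\mathcal{X}^n)$ and $y \in \mathcal{T}_V(x)$. The following theorem (reformulated slightly from [?]) gives one of the general forms of the error exponent of a classical DMC. 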

Theorem 4. For a given type $P \in \mathcal{P}(\mathcal{X}^n)$ let $A \subset \mathcal{T}_P(\mathcal{X}^n)$, $|A| = d^{R'n}$, be a code such that for every stochastic matrix $V : \mathcal{X} \to \mathcal{X}$ 

(3) $|\{(x_i,x_j) \in A \times A : x_j \in \mathcal{T}_V(x_i)\}| \le \exp[n(R' - I(P,V))]$. 

Suppose that $A$ is used over a DMC $W : \mathcal{X} \to \mathcal{Y}$ together with a maximum mutual information decoder. Then the exponent $E(A,W)$ of the maximum error probability $\max_{x\in A} P_e(x)$ satisfies $E(A,W) \ge E_r(P,R',W)$, where 

(4) $E_r(P,R',W) = \min_{V}\bigl[D(V\|W|P) + |I(P,V) - R'|^+\bigr]$, 
and where $V$ runs over the set of all channels $\mathcal{X} \to \mathcal{Y}$. 

Remarks. 1. This theorem is a generalization of a classical fact of coding theory, that "binary linear codes of rate $R$ and weight distribution $A_w \le 2^{n(R-1)}\binom{n}{w}$, $w = 1,2,\dots,n$, achieve the random coding exponent of the binary symmetric channel." 

2. The best bound on the reliability exponent of the channel $W$ is obtained by computing the maximum on $P$ in (4). The quantity $E(R',W) = \max_P E_r(P,R',W)$ is usually called the random coding exponent of $W$. 

3. The maximum mutual information decoder, which is used to prove this result and which was employed in [?], is different from the decoder defined in Sect. 3. 

B. Additive channels and codes. Recall that in our problem $\mathcal{X}$ is an additive group and that $\mathcal{Y} = \mathcal{X}$. Further, since the channel $W$ is symmetric, the maximizing input distribution $P$ in (4) is known to be uniform [7]: $P_u(x) = |\mathcal{X}|^{-1}$ for any $x \in \mathcal{X}$. 

Let us substitute $P_u$ into the condition (3) on the "distance distribution" of the code $A$. Let $x$ be a vector such that $T(x) = P_u$ and let $V$ be a stochastic matrix such that $\mathcal{T}_V(x) \cap \mathcal{T}_{P_u}(\mathcal{X}^n) \ne \emptyset$. Then for any letter $x \in \mathcal{X}$ the sum $\sum_{y\in\mathcal{X}} V(y|x) = 1$. We compute 

$$I(P_u,V) = \log|\mathcal{X}| - H(V|P_u),$$

so the upper bound in (3) takes the form 

(5) $|\{(x_i,x_j) \in A \times A : x_j \in \mathcal{T}_V(x_i)\}| \le \exp[n(R' + H(V|P_u) - \log|\mathcal{X}|)]$. 

Now consider the code $C$ from Theorem 3. Almost all of its codewords are of type $P_u$ and nearby types (types close to it in some suitable metric, say, the $\ell_1$-distance). We claim that the "distance distribution" of the code $C$ satisfies (5). Since the code is additive, it suffices to consider matrices $V$ such that $V(y|x)$ depends only on the difference $y - x$. Any such matrix defines a distribution $V(z) = V(z|0)$ on $\mathcal{X}$. Using this in (5), we observe that this condition reduces to the condition (2) satisfied by the "weight" distribution of $C$. Now recall from [?] that the function $E_r(P,R',W)$ is uniformly continuous on $P$ and that, on account of the channel and code being additive, the error probability of decoding does not depend on the transmitted codeword. Therefore for growing $n$ the error exponent of the code $C$ attains the bound $E(R',W)$. This proves Theorem 1. 

Transforming the exponent (4) to the form (1) is a matter of calculation. Indeed, let us substitute $P_u$ in (4). Clearly, $D(V\|W|P_u) = D(V\|W)$, where on the right-hand side $V$ and $W$ are probability distributions on $\mathcal{X}$ given by $W(z) = W(y|x)$, $V(z) = V(y|x)$ for any $y, x$ such that $z = y - x$. Further, since $\log|\mathcal{X}| = \log_d d^2 = 2$, 

$$I(P_u,V) - R' = -H(V|P_u) + \log|\mathcal{X}| - 1 - R = 1 - R - H(V),$$

where we have used the relation $R' = 2R(C) = 1 + R$. 
Further observations. 

1. By the same token, the capacity of the quantum channel $W$ is bounded below by the capacity of the classical symmetric channel $W$. Again the mutual information is maximized for the uniform input distribution, which implies that the capacity is at least $1 - H(W)$ independently of the results on error exponents. Note however that when this result is specialized to the depolarizing channel (see Example in the next section), it falls below the best currently known estimate of [?]. 

2. If we return from (4) to Gallager's original form of the random coding bound (by a method outlined in [?, pp. 192-193]), the exponent (1) can be written in a somewhat more convenient form. Namely: 

Theorem 5. Let $E_0(\rho,W) = \rho - (1+\rho)\log\sum_{x\in\mathcal{X}} W(x)^{1/(1+\rho)}$. Then 

$$E_r(R,W) = 1 - R - \log\Bigl(\sum_{x\in\mathcal{X}}\sqrt{W(x)}\Bigr)^2 \qquad \Bigl(0 \le R \le \frac{\partial E_0}{\partial\rho}\Big|_{\rho=1}\Bigr)$$

and 

$$E_r(R,W) = \max_{0\le\rho\le 1}\,[-\rho R + E_0(\rho,W)] \qquad \Bigl(\frac{\partial E_0}{\partial\rho}\Big|_{\rho=1} \le R \le 1 - H(W)\Bigr).$$
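Theorem 5 lends itself to direct numerical evaluation. The sketch below assumes $E_0(\rho,W) = \rho - (1+\rho)\log_d\sum_x W(x)^{1/(1+\rho)}$ and the qubit depolarizing distribution, and maximizes $-\rho R + E_0(\rho,W)$ over a grid of $\rho \in [0,1]$; the grid search and the function names are illustrative choices, not a method taken from the text.

```python
import math

def depolarizing_distribution(d, p):
    """Distribution on q = d^2 letters: 1-p on the zero letter, p/(q-1) elsewhere."""
    q = d * d
    return [1.0 - p] + [p / (q - 1)] * (q - 1)

def entropy(dist, base):
    """Shannon entropy with logarithms to the given base."""
    return -sum(w * math.log(w, base) for w in dist if w > 0)

def E0(rho, W, d):
    """E_0(rho, W) = rho - (1 + rho) * log_d sum_x W(x)^(1/(1+rho))."""
    s = sum(w ** (1.0 / (1.0 + rho)) for w in W if w > 0)
    return rho - (1.0 + rho) * math.log(s, d)

def random_coding_exponent(R, W, d, steps=2000):
    """Grid approximation of max over 0 <= rho <= 1 of -rho*R + E0(rho, W)."""
    return max(-rho * R + E0(rho, W, d)
               for rho in (i / steps for i in range(steps + 1)))

d, p = 2, 0.0005
W = depolarizing_distribution(d, p)
at_zero_rate = random_coding_exponent(0.0, W, d)
# Straight-line form of Theorem 5 evaluated at R = 0.
straight_line = 1.0 - 2.0 * math.log(sum(math.sqrt(w) for w in W), d)
at_capacity_bound = random_coding_exponent(1.0 - entropy(W, d), W, d)
```

At low rates the maximum is attained at $\rho = 1$ and the grid value coincides with the straight-line expression; at $R = 1 - H(W)$ the exponent vanishes.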

3. In the classical setting, the line of thought realized in Theorem 1 would correspond to an attempt to prove error bounds for a general DMC relying on the class of additive codes. It is well known [?], [?] that this approach produces good results only when the optimizing probability distribution on the input alphabet is uniform. The classical channel derived from a general QDMC for stabilizer codes turns out to be additive and hence symmetric. Hence the lower bounds on the reliability exponent thus obtained are arguably rather strong. 



5. Expurgation exponent for a QDMC 

Let $Q \subset \mathcal{H}_n$ be a stabilizer code of rate $R = R(Q)$ used over a QDMC $W$ together with the decoder defined in Sect. 3. Define the $W$-weight of a letter $x \in \mathcal{X}$ as 

$$|x|_W = -\log\sum_{e\in\mathcal{X}}\sqrt{W(e)W(e-x)},$$

where $\log 0 = -\infty$ by definition. 
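As an illustration, the $W$-weight can be tabulated for the qubit depolarizing channel. The sketch assumes $|x|_W = -\log_d\sum_e\sqrt{W(e)W(e-x)}$, identifies the additive group of $\mathcal{X}$ with pairs in $\mathbb{Z}_d \times \mathbb{Z}_d$, and uses hypothetical function names.

```python
import math
from itertools import product

def depolarizing(d, p):
    """Letter probabilities: 1-p on (0,0), p/(q-1) on every other pair."""
    q = d * d
    return {(a, b): (1.0 - p if (a, b) == (0, 0) else p / (q - 1))
            for a, b in product(range(d), repeat=2)}

def w_weight(x, W, d):
    """|x|_W = -log_d sum_e sqrt(W(e) W(e - x)), with subtraction in Z_d x Z_d."""
    total = 0.0
    for e in W:
        shifted = ((e[0] - x[0]) % d, (e[1] - x[1]) % d)
        total += math.sqrt(W[e] * W[shifted])
    # A vanishing sum corresponds to an infinite weight.
    return math.inf if total == 0 else -math.log(total, d)

d, p = 2, 0.0005
q = d * d
W = depolarizing(d, p)
weights = {x: w_weight(x, W, d) for x in W}
# Closed form of the inner sum for a nonzero letter of the depolarizing channel.
gamma = 2 * math.sqrt((1 - p) * p / (q - 1)) + (q - 2) * p / (q - 1)
```

The zero letter has weight 0, and all nonzero letters share the same weight $-\log_d\gamma$, which reflects the symmetry of the depolarizing channel.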
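Theorem 6. 

$$E(R,W) \ge E_x(R,W) = \min_{P:\ H(P) \ge 1-R}\Bigl[\sum_{x\in\mathcal{X}} P(x)|x|_W - (R + H(P) - 1)\Bigr].$$

Proof. We start with the code $C$ whose existence is proved in Theorem 3. Let $Q$ be the stabilizer quantum code associated with it. By Theorem 2, 

$$\begin{aligned}
1 - F(Q,W) &\le \sum_{e\notin\mathcal{E}} W^n(e) \le \sum_{x\in C\setminus\{0\}}\ \sum_{y\in\mathcal{X}^n:\ W^n(y-x)\ge W^n(y)} W^n(y)\\
&\le \sum_{x\in C\setminus\{0\}}\ \sum_{y\in\mathcal{X}^n} \sqrt{W^n(y)W^n(y-x)}\\
&= \sum_{P\in\mathcal{P}(\mathcal{X}^n),\ P\ne P_0}\ \sum_{x\in C\cap\mathcal{T}_P(\mathcal{X}^n)}\ \sum_{y} \sqrt{W^n(y)W^n(y-x)}\\
&\le \sum_{P\in\mathcal{P}(\mathcal{X}^n)} \exp_d\Bigl[2n(R(C) + H_q(P) - 1 + o(1)) - n\sum_{x\in\mathcal{X}} P(x)|x|_W\Bigr],
\end{aligned}$$

where the last step follows from (2) and because the channel is memoryless. Conclude by computing the logarithm and substituting the relations $2R(C) = 1 + R$ and $2H_q(P) = H(P)$. □ 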

Note that it is possible that $E_x(R,W)$ becomes infinite for $R \downarrow R_\infty(W) > 0$, which means that for rates $R < R_\infty(W)$ errors outside the set $\mathcal{E}$ occur with probability zero. The quantity $R_\infty(W)$ gives a lower bound on the zero-error capacity of the channel $W$. Shannon's classical example of a channel with $R_\infty(W) > 0$ [?, p. 532] is given by the additive channel with $\mathcal{X} = \mathbb{Z}_5$ and $W(x) = W(x+1) = 1/2$. Clearly, $R_\infty(W) > 0$ if and only if $|x|_W = \infty$ for some $x \in \mathcal{X}$. A channel is called indivisible if this condition does not hold, and hence $R_\infty(W) = 0$. 

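The function $E_x$ can be transformed to a different form, also due to Gallager [?]: 

$$E_x(R,W) = \sup_{\rho\ge 1}\,[-\rho R + E_{ex}(\rho,W)],$$

where 

$$E_{ex}(\rho,W) = -\rho\log_d\Bigl[\frac{1}{d}\sum_{x\in\mathcal{X}}\Bigl(\sum_{e\in\mathcal{X}}\sqrt{W(e)W(e+x)}\Bigr)^{1/\rho}\Bigr].$$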



Let us state a condition for the bound $E_x(R,W)$ to improve the result of Theorem 1. As remarked above, the optimizing probability distribution on $\mathcal{X}$ for the random coding bound (4) in our case is uniform. Moreover, the exponent $E_x$ is also derived under the same assumption. It is known [?] that for one and the same input distribution and for code rates $R < \partial E_{ex}(\rho,W)/\partial\rho|_{\rho=1}$ the function $E_x(R,W)$ is greater than $E_r(R,W)$, so in this region of rates Theorem 6 improves the result of Theorem 1. Hence if the point $R_x = \partial E_{ex}(\rho,W)/\partial\rho|_{\rho=1} > 0$, then there is a nonempty interval of code rates where $E_x(R,W) > E_r(R,W)$. Note that typically such an interval exists only for low noise level in the channel. To make an analogy with the classical case, the improvement takes place if the value of the code rate $R(C)$ that corresponds to $R_x$ is greater than $1/2$. In the range where it improves the bound (1), the exponent $E_x(R,W)$ can be written as 

(6) $E_x(R,W) = \min_{P:\ H(P) = 1-R} \mathsf{E}|X|_W$, 

where $X$ is a random variable on $\mathcal{X}$ distributed according to $P$. This follows by the Gilbert-Varshamov bound of Theorem 3. 

Remark. The general form of the function $E_x(R,W)$ for a given additive, indivisible channel $W$ is as follows: 

$$E_x(R,W) = \max_{P}\sup_{\rho\ge 1}\,[-\rho R + E_{ex}(\rho,P,W)],$$

where 

$$E_{ex}(\rho,P,W) = -\rho\log_d\sum_{x,x'\in\mathcal{X}} P(x)P(x')\Bigl(\sum_{e\in\mathcal{X}}\sqrt{W(x-e)W(x'-e)}\Bigr)^{1/\rho}.$$

Figure 1. Error exponents for the depolarizing channel with $d = 2$ and $p = 0.0005$. For $0 < R < R_x$ the function $E_x$ gives a stronger bound than $E_r$. 

Optimization on the input distribution $P$ in this expression is easy if the $q \times q$ matrix 

$$\Bigl[\Bigl(\sum_{e}\sqrt{W(x-e)W(x'-e)}\Bigr)^{1/\rho}\Bigr]_{x,x'\in\mathcal{X}}$$



is nonnegative definite for every $\rho \ge 1$ [11], and turns into a difficult problem otherwise. For the channel to be nonnegative definite it is sufficient that for every pair of distinct vectors $(x,x')$ the sum on $e$ in the expression for $E_{ex}(\rho,P,W)$ takes one and the same value (the so-called equidistant channels). For equidistant channels the maximum on $P$ is achieved for the uniform distribution $P(x) = 1/q$, $x \in \mathcal{X}$. For instance, the $d$-ary depolarizing channel is equidistant. However, there are many examples of additive, indivisible channels that are not nonnegative definite. For instance, let $d = 3$. Consider the channel given by the following probability distribution: 

u 00 01 02 10 11 12 20 21 22 
W(u) 0.49 0.01 0.01 0.49 

where $u \in (\mathbb{F}_3)^2 \cong \mathbb{Z}_3 \times \mathbb{Z}_3$. It is easily verified that this channel is not nonnegative definite for $\rho > 1.37$. 

□ 



Example. Let us specialize the results of Theorems 1 and 6 for the case of the $d$-ary depolarizing channel $W$. Let us denote the reliability exponent of $W$ by $E(R,p)$. The result can be expressed in a closed form. Let 

$$h(x) = -x\log_q\frac{x}{q-1} - (1-x)\log_q(1-x),$$

$$D(x\|y) = x\log_q\frac{x}{y} + (1-x)\log_q\frac{1-x}{1-y},$$

$$\delta(x) = h^{-1}(1-x).$$

We have 

$$E(R(Q),W) \ge 2E_e((1+R)/2,\,p),$$

where 

(7) $E_e(r,p) = \delta(r)\log_q\gamma_q(p) \quad (0 \le r \le r_x)$ 

$E_e(r,p) = D(p_0\|p) + r_c - r \quad (r_x \le r \le r_c)$ 

$E_e(r,p) = D(\delta(r)\|p) \quad (r_c \le r \le 1 - h(p))$, 



QPo 



r c = 1 - h(po), 




Vp(q - 1) , , 

This reliability exponent can be obtained from the theorems above or computed directly starting with codes whose existence is proved in Theorem 3. The expurgation exponent in (7) is straightforward from (6). If $R_x := 2r_x - 1 > 0$, then from (7) we obtain an improvement over the result of Theorem 1 in the interval of values of $R$ between zero and $R_x$. It turns out that this condition is satisfied for low noise level (see an example in Fig. 1). For $d = 2$ the expurgation bound improves the random coding exponent for $0 < p < 0.004$. □ 
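The noise range in which the expurgation bound helps can be explored numerically. The sketch below rests on two labeled assumptions: the depolarizing distribution $W(0) = 1-p$, $W(v) = p/(q-1)$, for which the inner sum $\sum_e\sqrt{W(e)W(e+x)}$ equals 1 for $x = 0$ and a common value $\gamma$ otherwise; and the normalization $E_{ex}(\rho,W) = \rho - \rho\log_d\sum_x(\cdot)^{1/\rho}$, chosen so that $\sup_{\rho\ge 1}[-\rho R + E_{ex}]$ agrees with the form of $E_x$ in Theorem 6. The derivative is taken by a central difference, and all names are hypothetical.

```python
import math

def gamma(p, q):
    """Common value of sum_e sqrt(W(e) W(e+x)) for x != 0 (depolarizing channel)."""
    return 2.0 * math.sqrt(p * (1.0 - p) / (q - 1)) + p * (q - 2) / (q - 1)

def E_ex(rho, p, d):
    """Assumed form: rho - rho * log_d(1 + (q-1) * gamma^(1/rho))."""
    q = d * d
    s = 1.0 + (q - 1) * gamma(p, q) ** (1.0 / rho)
    return rho - rho * math.log(s, d)

def R_x(p, d, h=1e-6):
    """Central-difference estimate of dE_ex/d(rho) at rho = 1."""
    return (E_ex(1.0 + h, p, d) - E_ex(1.0 - h, p, d)) / (2.0 * h)

def improvement_threshold(d, lo=1e-4, hi=1e-2, iters=60):
    """Bisection for the noise level at which R_x changes sign (R_x > 0 at lo, < 0 at hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if R_x(mid, d) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

threshold = improvement_threshold(2)
```

Under these assumptions $R_x$ is positive at $p = 0.0005$ and changes sign slightly above $p = 0.004$, in line with the range quoted in the example.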

Acknowledgment. The author is grateful to A. Ashikhmin and G. Kramer for helpful discussions. 